Diachronic vocabulary adaptation for b
Author
Abstract
This article investigates the use of Internet news sources to automatically adapt the vocabulary of a French and an English broadcast news transcription system. A specific method is developed to gather training, development and test corpora from selected websites and to normalize them for further use. A vectorial vocabulary adaptation algorithm is described which interpolates word frequencies estimated on adaptation corpora to directly maximize lexical coverage on a development corpus. To test the generality of this approach, experiments were carried out simultaneously in French and in English (UK) on a daily basis for the month of May 2004. In both languages, the OOV rate is reduced by more than half.
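The idea of interpolating word frequencies to maximize lexical coverage can be illustrated with a small sketch. This is not the article's algorithm, whose details are not given in the abstract. It assumes unigram relative frequencies, a fixed vocabulary size, and a coarse grid search over interpolation weights. All function names (`relative_freqs`, `interpolate`, `select_vocabulary`, `oov_rate`, `best_weights`) are hypothetical.

```python
from collections import Counter
from itertools import product


def relative_freqs(tokens):
    """Relative frequency of each word in one adaptation corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(freq_tables, weights):
    """Weighted sum of per-corpus relative frequencies."""
    mixed = Counter()
    for table, weight in zip(freq_tables, weights):
        for word, freq in table.items():
            mixed[word] += weight * freq
    return mixed


def select_vocabulary(mixed, size):
    """Keep the `size` highest-scoring words (ties broken alphabetically)."""
    ranked = sorted(mixed.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:size]}


def oov_rate(vocabulary, tokens):
    """Fraction of running words not covered by the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def best_weights(freq_tables, dev_tokens, size, steps=10):
    """Assumed search strategy: try weights on a grid that sums to 1 and
    keep the first combination with the lowest OOV rate on the dev corpus."""
    grid = [i / steps for i in range(steps + 1)]
    best = None
    for combo in product(grid, repeat=len(freq_tables) - 1):
        last = 1.0 - sum(combo)
        if last < -1e-9:
            continue  # weights would exceed 1 in total
        weights = list(combo) + [max(last, 0.0)]
        vocab = select_vocabulary(interpolate(freq_tables, weights), size)
        rate = oov_rate(vocab, dev_tokens)
        if best is None or rate < best[0]:
            best = (rate, weights)
    return best


# Toy data: an older corpus and a recent one, plus a small development set.
older = relative_freqs(["the", "the", "budget", "budget", "minister"])
recent = relative_freqs(["the", "the", "election", "election", "vote"])
rate, weights = best_weights([older, recent], ["the", "election", "vote"], size=3)
```

On this toy data the search favors the recent corpus, since its words are the ones that appear in the development set. A real system would use far larger corpora and vocabulary sizes, and a more efficient optimizer than a grid.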
Similar resources
How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News
Out-Of-Vocabulary (OOV) words missed by Large Vocabulary Continuous Speech Recognition (LVCSR) systems can be recovered with the help of topic and semantic context of the OOV words captured from a diachronic text corpus. In this paper we investigate how the choice of documents for the diachronic text corpora affects the retrieval of OOV Proper Names (PNs) relevant to an audio document. We first...
Proper name retrieval from diachronic documents for automatic speech transcription using lexical and temporal context
Proper names are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving new proper names from contemporary diachronic text documents. The idea is to use in-vocabulary proper names as an anchor to collect new linked proper names from the diachronic corpus. Our assump...
Representing Polysemy and Diachronic Lexico-Semantic Data on the Semantic Web ?
In this article we will outline two different vocabularies, both extensions of the lemon model, for representing diachronic lexicosemantic data on the Semantic Web. This is especially useful for representing the evolution of scientific terminologies where many terms are polysemous and/or imported from other languages. The first vocabulary, polyLemon, allows for the representation of data about ...
Morphosemantic fields in the analysis of Croatian vocabulary
This paper presents the morphosemantic field model, claiming that it is relevant in the description of lexical structures in grammatically-motivated languages such as Croatian. Arguments are presented for the applicability of the model in synchronic and diachronic lexical analysis. The fact that many characteristics of morphosemantic fields are compatible with the theoretical framework of cogni...
Optimality and diachronic adaptation
In this programmatic paper, I argue that the universal constraints of Optimality Theory (OT) need to be complemented by a theory of diachronic adaptation. OT constraints are traditionally stipulated as part of Universal Grammar, but this misses the generalization that the grammatical constraints normally correspond to constraints on language use. As in biology, observed adaptive patterns in lan...
Journal title:
Volume, issue
Pages -
Publication date: 2005